The Plant Genome
Wiley
All preprints, ranked by how well they match The Plant Genome's content profile, based on 53 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Tomura, S.; Powell, O. M.; Wilkinson, M. J.; Cooper, M.
While various genomic prediction models have been evaluated for their potential to accelerate genetic gain for multiple traits, no individual genomic prediction model has outperformed all others across all applications. As an alternative approach, ensembles of multiple individual genomic prediction models can be applied to utilise the complementary strengths of individual prediction models and offset the prediction errors of each. We used the EasiGP (Ensemble AnalySis with Interpretable Genomic Prediction) pipeline to investigate the performance of an ensemble approach, targeting flowering-time traits measured in two maize nested association mapping datasets. For both datasets, the ensemble-based prediction approach achieved higher prediction accuracy and lower prediction error across the flowering-time traits compared to each individual model. Multiple genomic regions known to contain key flowering-time related genes were repeatedly included as features across individual genomic prediction models, indicating the models successfully captured SNPs as features that are associated with genomic regions known to contain flowering-time genes. Although repeatability was high for some genomic regions, estimated marker effects varied across many genomic regions, suggesting that the models might also have captured different aspects of the genetic variation underlying the traits. The ensemble combination of the diverse views likely contributed to the improvement of prediction performance by the ensemble-based approach over the individual prediction models. Ensemble-based prediction can be applied to overcome limitations observed in the continuous exploration for the best individual genomic prediction models that can consistently achieve the highest prediction performance, thereby potentially contributing to improved prediction accuracy for applications in crop breeding.
Article summary: This study targets researchers interested in the performance of genomic prediction models. To demonstrate potential advantages of an ensemble of diverse individual genomic prediction models, we investigated the prediction of key flowering-time traits (days to anthesis and anthesis to silking interval) in two maize datasets. The ensemble approach consistently improved the prediction performance. The improvement was attributed to the offset of prediction errors by combining multiple different dimensions of trait genetic variation. Ensembles can lead to higher selection accuracy of desirable individuals for applications in crop breeding.
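The idea that an ensemble can offset the prediction errors of its member models can be illustrated with a minimal sketch. The abstract does not state how EasiGP combines its models, so the unweighted mean used here is an assumption, and the function names and toy values (`ensemble_mean`, `rmse`, `model_a`, `model_b`) are hypothetical.

```python
import numpy as np

def ensemble_mean(predictions):
    """Average predictions across models (rows = models, columns = individuals).

    Assumption: an unweighted mean; the actual pipeline may weight models differently.
    """
    return np.mean(np.asarray(predictions, dtype=float), axis=0)

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy phenotypes (e.g. days to anthesis) and two models whose errors partly oppose each other.
y_true = np.array([70.0, 72.0, 68.0, 75.0])
model_a = y_true + np.array([2.0, -1.0, 1.0, -2.0])
model_b = y_true + np.array([-1.0, 2.0, -2.0, 1.0])

ensemble = ensemble_mean([model_a, model_b])

print("RMSE model A :", rmse(y_true, model_a))
print("RMSE model B :", rmse(y_true, model_b))
print("RMSE ensemble:", rmse(y_true, ensemble))
```

In this toy case the two models err in partly opposite directions, so the averaged prediction has a lower error than either model alone. With real models the gain depends on how correlated their errors are.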
Orhobor, O. I.; Alexandrov, N. N.; Chebotarov, D.; Kretzschmar, T.; McNally, K. L.; Sanciangco, M. D.; King, R. D.
To secure the world's food supply, it is essential that we improve our knowledge of the genetic underpinnings of complex agronomic traits. In this paper, we report our findings from performing trait prediction and association mapping using marker stability in diverse rice landraces. We used the least absolute shrinkage and selection operator as our marker selection algorithm, and considered twelve real agronomic traits and a hundred simulated traits using a population with approximately a hundred thousand markers. For trait prediction, we considered several statistical/machine learning methods. We found that some of the methods considered performed best when preselected markers using marker stability were used. However, our results also show that one might need to make a trade-off between model size and performance for some learning methods. For association mapping, we compared marker stability to the genome-wide efficient mixed-model analysis (GEMMA), and for the simulated traits, we found that marker stability significantly outperforms GEMMA. For the real traits, marker stability successfully identifies multiple associated markers, which often include those selected by GEMMA. Further analysis of the markers selected for the real traits using marker stability showed that they are located in known quantitative trait loci (QTL) using the QTL Annotation Rice Online database. Furthermore, co-functional network prediction of the selected markers using RiceNet v2 also showed association to known controlling genes. We argue that a wide adoption of the marker stability approach for the prediction of agronomic traits and association mapping could improve global rice breeding efforts.
Rebollo, I.; Tolhurst, D.; Obsteter, J.; Rosas, J. E.; Gorjanc, G.
Rice (Oryza sativa L.) has two main subspecies, indica and japonica, which coexist in many regions but are often treated separately during breeding. Combining both subspecies in quantitative genetic analyses could enhance genetic improvement; however, this requires appropriately modelling their genetic history. The ancestral recombination graph (ARG) is an effective population genetics tool that comprehensively and succinctly represents a species' genetic history. This study evaluated the use of an ARG, encoded as a tree sequence, to improve quantitative genetic analyses of indica and japonica rice. Using data from Uruguay's National Rice Breeding Program, we inferred ancestral alleles, constructed and dated an ARG, and examined its application in genomic prediction and genome-wide association studies. We compared the predictive ability of a branch-based relationship matrix (BRM) built from an ARG against conventional relationship matrices from pedigree and single nucleotide polymorphism (SNP) site data. We then estimated the BRM's SNP site effects to identify potential sites of interest and better understand how these map onto the tree sequence branches. The results showed that the ARG captured key biological signals, encoded genomic data more efficiently than conventional formats, and resulted in the highest predictive ability when combining both subspecies. Although the ARG-based approach did not substantially outperform conventional approaches for between-species prediction, this approach holds promise for plant breeding with larger datasets and could enhance genome-wide association studies by elucidating haplotype ancestry and the evolution of their value. Overall, our results demonstrated the potential of ARGs for the quantitative genetic analysis of diverse populations.
Godoy, J. C.; Edwards, J.; Lee, E. C.; Mikel, M. A.; Fernandes, S. B.; Hirsch, C. N.; Berry, S. P.; Lipka, A. E.; Bohn, M. O.
The early 20th-century discovery of heterosis and the establishment of heterotic groups transformed maize (Zea mays L.) into a keystone of global agriculture. However, maize breeding faces two significant challenges: the gradual decline of general combining ability (GCA) variance within heterotic groups and the impracticality of testing all possible single crosses in the early stages of a breeding program. Here, we developed genomic best linear unbiased prediction (GBLUP)-based multi-kernel models, using additive and two alternative non-additive genomic relationship matrices, to estimate the variance components associated with the GCA of Stiff Stalk (SS) and Non-Stiff Stalk (NSS) heterotic groups and the specific combining ability (SCA) arising from their crosses. We further applied these models to predict the performance of untested single-cross combinations under varying levels of parental information. We showed that the SS and NSS groups retained significant GCA variance across traits in both early- and late-maturity groups. The SS group, in contrast, exhibited no detectable GCA variance in grain yield for the intermediate-flowering subset of hybrids, highlighting a limitation for future genetic improvement. Furthermore, our results showed that GBLUP-based multi-kernel models effectively identified superior hybrids when parental information was available. In the absence of this information, however, these models underperformed compared to covariance-based approaches. Both types of non-additive matrices produced similar results, affirming the robustness of the inferred genetic architecture. Overall, this study sheds light on the future use of US maize commercial germplasm and demonstrates how GBLUP-based multi-kernel models can improve the efficiency of hybrid breeding programs.
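The additive genomic relationship matrix that GBLUP-type models rely on can be sketched as follows. The abstract does not say which formulation was used, so this sketch assumes the widely used VanRaden (2008) method 1, with genotypes coded 0/1/2 and allele frequencies estimated from the sample. The function name `additive_grm` and the toy genotypes are hypothetical.

```python
import numpy as np

def additive_grm(genotypes):
    """Additive genomic relationship matrix, VanRaden (2008) method 1.

    genotypes: (individuals x markers) array coded as 0/1/2 copies of one allele.
    Assumption: allele frequencies are estimated from the same sample.
    """
    M = np.asarray(genotypes, dtype=float)
    p = M.mean(axis=0) / 2.0              # allele frequency per marker
    Z = M - 2.0 * p                       # centre each marker column
    denom = 2.0 * np.sum(p * (1.0 - p))   # scales G to be analogous to pedigree relationships
    return Z @ Z.T / denom

# Toy data: 4 inbred-like individuals scored at 3 markers.
M = np.array([
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [1, 2, 1],
])
G = additive_grm(M)
print(np.round(G, 3))
```

In a multi-kernel model, a matrix like `G` would be one of several covariance kernels (for example, one per heterotic group for GCA and a non-additive kernel for SCA). How those kernels were built in the study is not specified in the abstract.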
Vidigal, P. M. P.; Momen, M.; Costa, P. M. A.; Barbosa, M. H. P.; Morota, G.; Peternelli, L. A.
Background: The identification of genomic regions involved in agronomic traits is the primary concern for sugarcane breeders. Genome-wide association studies (GWAS) leverage the sequence variations to bridge phenotypes and genotypes. However, their effectiveness is limited in species with high ploidy and large genomes, such as sugarcane. As an alternative, a regional heritability mapping (RHM) method can be used to capture genetic signals that may be missed by GWAS by combining genetic variance from neighboring regions. We used RHM to screen the sugarcane genome aiming to identify regions with higher heritability associated with agronomic traits. We considered percentage of fiber in sugarcane bagasse (FB), apparent percentage of sugarcane sucrose (PC), tonnes of pol per hectare (TPH), and tonnes of stalks per hectare (TSH). Methods: Sequence-capture data of 508 sugarcane (Saccharum spp.) clones from a breeding population under selection were processed for variant calling analysis using the sugarcane genome cultivar R570 as a reference. A set of 375,195 single nucleotide polymorphisms were selected after quality control. RHM was conducted by splitting the sugarcane genome into windows of 2 Mb length. Results: We selected the windows explaining > 20% of the total genomic heritability for TPH (64 windows - 5,654 genes) and TSH (72 windows - 6,050 genes), and > 15% for PC (16 windows - 1,517 genes) and FB (17 windows - 1,615 genes). The top five windows that explained the highest genomic heritability ranged from 20.8 to 24.6% for FB (629 genes), 18.0 to 22.0% for PC (452 genes), 53.8 to 66.0% for TPH (705 genes), and 59.5 to 67.4% for TSH (413 genes). The functional annotation of genes included in those top five windows revealed a set of genes that encode enzymes that integrate carbon metabolism, starch and sucrose metabolism, and phenylpropanoid biosynthesis pathways.
Conclusions: The selection of windows that explained the large proportions of genomic heritability allowed us to identify genomic regions containing a set of genes that are related to the agronomic traits in sugarcane. These windows spanned a region of 58.38 Mb, which corresponds to 14.28% of the reference assembly in the sugarcane genome. We contend that RHM can be used as an alternative method for sugarcane breeders to reduce the complexity of the sugarcane genome.
Zhan, S.; Raherison, E.; Hargreaves, W.; Hughes, N.; Goessen, R.; Majidi, M. M.; Knox, R.; Cuthbert, R.; Lukens, L.
Background: Genetic variation of regulatory alleles plays a key role in evolution and breeding. In polyploids, regulatory differences may preferentially affect genes on homoeologous chromosomes or sub-genomes. Selection in plant breeding may act upon total transcript dosage across homoeologous genes and on alleles that have strong effects on the transcriptome. Results: To investigate these questions, we identified regulatory polymorphisms between an old and a recent hexaploid bread wheat cultivar (Triticum aestivum, 2n=6x=42, AABBDD). The recent cultivar was the product of decades of selection for grain yield and quality. Regulatory allele polymorphisms preferentially affected genes on homoeologous chromosomes but rarely affected genes on specific sub-genomes. The chromosomal distributions of regulatory alleles indicated that past selection had acted upon them, and the effect of selection differed between alleles targeting environmental response genes and genes involved in other processes. Modern cultivar alleles that affected many genes' transcripts corresponded to known selection targets and improved field crop performance. Modern cultivar alleles also had significant effects on homoeologous genes, and these alleles also improved crop performance. Conclusions: Polyploid breeding across many species has been and will continue to be the key factor in plant improvement. By enhancing the favorability of strong regulatory alleles and by expanding the range of gene transcript abundances, genome duplications enable breeding progress.
Ramstein, G. P.; Larsson, S. J.; Cook, J. P.; Edwards, J.; Ersoz, E. S.; Flint-Garcia, S.; Gardner, C. A.; Holland, J. B.; Lorenz, A. J.; McMullen, M. D.; Millard, M. J.; Rocheford, T. R.; Tuinstra, M. R.; Bradbury, P.; Buckler, E. S.; Romay, M. C.
Heterosis has been key to the development of maize breeding but describing its genetic basis has been challenging. Previous studies of heterosis have shown the contribution of within-locus complementation effects (dominance) and their differential importance across genomic regions. However, they have generally considered panels of limited genetic diversity and have shown little benefit to including dominance effects for predicting genotypic value in breeding populations. This study examined within-locus complementation and enrichment of genetic effects by functional classes in maize. We based our analyses on a diverse panel of inbred lines crossed with two testers representative of the major heterotic groups in the United States (1,106 hybrids), as well as a collection of 24 biparental populations crossed with a single tester (1,640 hybrids). We assayed three agronomic traits: days to silking (DTS), plant height (PH) and grain yield (GY). Our results point to the presence of dominance for all traits, but also among-locus complementation (epistasis) for DTS and genotype-by-environment interactions for GY. Consistently, dominance improved genomic prediction for PH only. In addition, we assessed enrichment of genetic effects in classes defined by genic regions (gene annotation), structural features (recombination rate and chromatin openness), and evolutionary features (minor allele frequency and evolutionary constraint). We found support for enrichment in genic regions and subsequent improvement of genomic prediction for all traits. Our results point to mechanisms by which heterosis arises through local complementation in proximal gene regions and suggest the relevance of dominance and gene annotations for genomic prediction in maize.
You, F. M.; Zheng, C.; Zagariah Daniel, J. J.; Li, P.; Jackle, K.; House, M.; Tar'an, B.; Cloutier, S.
Genomic selection (GS) is a promising strategy to improve breeding efficiency for complex traits such as seed yield by enabling early selection and reducing reliance on extensive field testing. However, practical deployment of GS remains challenging due to limited training population sizes and reduced prediction accuracies when models are applied to true breeding germplasm. In this study, we evaluated GS for flax (Linum usitatissimum L.) seed yield under realistic breeding scenarios, with a focus on across-population prediction (APP) and breeding decision support rather than model benchmarking. Using historical germplasm collections and a newly developed breeding-oriented population as training sets, GS performance was assessed across multiple independent test populations representing contemporary breeding lines evaluated in replicated yield trials. APP accuracies reached r = 0.84 when training and test populations were genetically aligned, supporting routine breeding deployment. Training population composition emerged as a key determinant of prediction success, with breeding-oriented populations consistently outperforming broad germplasm collections for predicting true breeding lines. Check-based selection analyses showed that GS reliably reproduced phenotypic advancement decisions while eliminating 61-91% of low-performing lines, resulting in 48-78% reduction in field evaluation costs for a typical cohort of 300 lines. Marker subsampling analyses further indicated that moderate-density genotyping-by-sequencing panels (~2,500-3,000 SNPs) are sufficient to achieve stable prediction accuracies. Overall, these results demonstrate that GS for seed yield in flax is ready for routine integration into breeding programs, offering a practical pathway to reduce costs, accelerate breeding cycles, and enhance selection efficiency.
Qi, W.; Lim, Y.-W.; Patrignani, A.; Schlaepfer, P.; Bratus-Neuenschwander, A.; Grueter, S.; Chanez, C.; Rodde, N.; Prat, E.; Vautrin, S.; Fustier, M.-A.; Pratas, D.; Schlapbach, R.; Gruissem, W.
Background: Cassava (Manihot esculenta) is an important clonally propagated food crop in tropical and sub-tropical regions worldwide. Genetic gain by molecular breeding is limited because cassava has a highly heterozygous, repetitive and difficult-to-assemble genome. Findings: Here we demonstrate that Pacific Biosciences high-fidelity (HiFi) sequencing reads, in combination with the assembler hifiasm, produced genome assemblies at near complete haplotype resolution with higher continuity and accuracy compared to conventional long sequencing reads. We present two chromosome scale haploid genomes phased with Hi-C technology for the diploid African cassava variety TME204. Genome comparisons revealed extensive chromosome re-arrangements and abundant intra-genomic and inter-genomic divergent sequences despite high gene synteny, with most large structural variations being LTR-retrotransposon related. Allele-specific expression analysis of different tissues based on the haplotype-resolved transcriptome identified both stable and inconsistent alleles with imbalanced expression patterns, while most alleles expressed coordinately. Among tissue-specific differentially expressed transcripts, coordinately and biasedly regulated transcripts were functionally enriched for different biological processes. We use the reference-quality assemblies to build a cassava pan-genome and demonstrate its importance in representing the genetic diversity of cassava for downstream reference-guided omics analysis and breeding. Conclusions: The haplotype-resolved genome allows the first systematic view of the heterozygous diploid genome organization in cassava. The completely phased and annotated chromosome pairs will be a valuable resource for cassava breeding and research. Our study may also provide insights into developing cost-effective and efficient strategies for resolving complex genomes with high resolution, accuracy and continuity.
Shaffer, W.; Papin, V.; Yadav, S.; Voss-Fels, K. P.; Hickey, L.; Hayes, B.; Dinglasan, E. G.
Quantitative trait loci (QTL) discovery studies on diversity panels or breeding populations typically use genome-wide association studies (GWAS) to estimate marker effects. For plant and animal breeding applications, researchers increasingly recognize the potential benefits of identifying superior haplotypes (markers in linkage disequilibrium; LD) rather than relying on single markers, as traditional approaches inefficiently account for cumulative signals from incomplete LD with QTL or split effects when multiple markers are in high LD with QTL. Using the genomic prediction framework, the local GEBV (localGEBV) method was developed in animal breeding and has been adopted in crop haplotype mapping studies; however, no study has thoroughly quantified the utility of this method or systematically compared outcomes to traditional GWAS approaches. Here, we characterized a strategy to group markers in chromosomal segments based on LD (haplotype blocks or haploblocks), computed localGEBV as a linear contrast of marker effects within each haploblock, and utilised the variance of localGEBV to enhance QTL discovery compared to traditional GWAS. Marker effects for localGEBV were estimated with ridge-regression best linear unbiased prediction (rrBLUP) and BayesR, with results compared to two common GWAS approaches. Using the barley row-type trait, we demonstrated that localGEBV improved QTL discovery and phenotypic prediction compared to single markers. Furthermore, localGEBV results were robust to the choice of prior marker assumptions and blocking parameters, enabling flexibility in fine or broad-scale QTL mapping. Overall, our findings establish localGEBV as a haplotype-based strategy capable of leveraging localized genomic effects to improve QTL discovery and, potentially, genomic selection.
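The localGEBV computation described above, a linear combination of marker effects within each haploblock followed by the variance of that value across individuals, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the marker effects are hypothetical placeholders for estimates that would come from rrBLUP or BayesR, the haploblock definitions are supplied by hand rather than derived from LD, and the names `local_gebv` and `block_variances` are invented for this sketch.

```python
import numpy as np

def local_gebv(genotypes, effects, blocks):
    """Per-individual local GEBV for each haploblock.

    genotypes: (individuals x markers) array of allele counts.
    effects:   per-marker effect estimates (hypothetical values here).
    blocks:    mapping of block name -> list of marker column indices.
    """
    X = np.asarray(genotypes, dtype=float)
    b = np.asarray(effects, dtype=float)
    return {name: X[:, idx] @ b[idx] for name, idx in blocks.items()}

def block_variances(local):
    """Variance of local GEBV across individuals (population variance, ddof=0)."""
    return {name: float(np.var(values)) for name, values in local.items()}

# Toy data: 4 individuals, 4 markers grouped into two hand-defined haploblocks.
X = np.array([
    [0, 1, 2, 0],
    [1, 1, 0, 1],
    [2, 0, 1, 2],
    [1, 2, 1, 1],
])
effects = np.array([0.5, -0.2, 0.0, 0.1])   # placeholder marker effects
blocks = {"block1": [0, 1], "block2": [2, 3]}

local = local_gebv(X, effects, blocks)
variances = block_variances(local)
for name in blocks:
    print(name, np.round(local[name], 3), "variance:", round(variances[name], 4))
```

Blocks with a larger variance of localGEBV are the ones that would be prioritised as candidate QTL regions. In this toy case `block1` stands out because its markers carry the larger placeholder effects.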
Vourlaki, I.-T.; Ramos-Onsins, S. E.; Perez-Enciso, M.; Castanera, R.
Structural variants (SVs) such as deletions, inversions, duplications, and Transposable Element (TE) Insertion Polymorphisms (TIPs) are prevalent in plant genomes and have played an important role in evolution and domestication, as they constitute a significant source of genomic and phenotypic variability. Nevertheless, most methods in quantitative genetics focusing on crop improvement, such as genomic prediction, consider Single Nucleotide Polymorphisms (SNPs) as the only type of genetic marker. Here, we used rice to investigate whether combining the structural and nucleotide genome-wide variation can improve prediction ability of traits when compared to using only SNPs. Moreover, we also examined the potential advantage of Deep Learning (DL) networks over Bayesian Linear models, which have been widely applied in genomic prediction. Specifically, the performance of BayesC and a Bayesian Reproducing Kernel Hilbert Space regression were compared to two different DL architectures, the Multilayer Perceptron and the Convolutional Neural Network. We further explored their prediction ability by using various marker input strategies and found that exploiting structural and nucleotide variation improves prediction ability on complex traits in rice. Also, DL models outperformed Bayesian models in 75% of the studied cases. Finally, DL systematically improved prediction ability of binary traits against the Bayesian models.
Bertolini, E.; Manjunath, M.; Ge, W.; Murphy, M. D.; Inaoka, M.; Fliege, C.; Eveland, A. L.; Lipka, A. E.
Plant architecture is a major determinant of planting density, which enhances productivity potential for crops per unit area. Genomic prediction is well-positioned to expedite genetic gain of plant architecture traits since they are typically highly heritable. Additionally, the adaptation of genomic prediction models to query predictive abilities of markers tagging certain genomic regions could shed light on the genetic architecture of these traits. Here, we leveraged transcriptional networks from a prior study that contextually described developmental progression during tassel and leaf organogenesis in maize (Z. mays) to inform genomic prediction models for architecture traits. Since these developmental processes underlie tassel branching and leaf angle, two important agronomic architecture traits, we tested whether genes prioritized from these networks quantitatively contribute to the genetic architecture of these traits. We used genomic prediction models to evaluate the ability of markers in the vicinity of prioritized network genes to predict breeding values of tassel branching and leaf angle traits for two diversity panels in maize, and diversity panels from sorghum (S. bicolor) and rice (O. sativa). Predictive abilities of markers near these prioritized network genes were similar to those using whole-genome marker sets. Notably, markers near highly connected transcription factors from core network motifs in maize yielded predictive abilities that were significantly greater than expected by chance in not only maize but also closely related sorghum. We expect that these highly connected regulators are key drivers of architectural variation that are conserved across closely related cereal crop species.
Article summary: We used an approach typically used for breeding to infer the contributions of biological gene networks to plant architectural traits. We found that markers near genes belonging to smaller, specialized gene networks from maize could predict breeding values of leaf angle better than expected by chance for both maize and sorghum.
Villwock, S. S.; Parkes, E. Y.; Nkouaya Mbanjo, E. G.; Rabbi, I.; Jannink, J.-L.
Cassava breeders aim to increase the provitamin A carotenoid content of storage roots to help combat vitamin A deficiency in sub-Saharan Africa, but a negative genetic correlation between total carotenoid (TC) and dry matter (DM) contents hinders breeding efforts. Genetic linkage between a major-effect variant in the phytoene synthase 2 (PSY2) gene and nearby candidate gene(s) has been thought to drive this correlation. Evidence from molecular experiments, however, suggests there may be a metabolic relationship between TC and DM, which we predicted would create genome-wide mediated pleiotropy. Bivariate genome-wide associations were used to test the hypothesis of pleiotropy and examine the genetic architecture of the negative covariance between TC and DM. A population of 378 accessions in the yellow-fleshed cassava breeding program at the International Institute of Tropical Agriculture (IITA) in Ibadan, Nigeria was genotyped with DArTseqLD. TC measured by iCheck spectrometer and DM data were available from field trials over ten years across three locations in Nigeria. Mixed linear models controlling for the previously-identified PSY2 causal variant were used to identify multiple new quantitative trait loci (QTL) jointly associated with both traits. The majority of 17 jointly-associated loci identified at a relaxed significance threshold affected TC and DM in opposite directions, although this pattern did not reach statistical significance in a binomial test. Even after accounting for the effects of these 17 loci as covariates, there was significantly negative polygenic covariance between TC and DM remaining. These findings support the hypothesis that mediated pleiotropy rather than genetic linkage drives the negative genetic correlation between TC and DM in cassava and demonstrate a new application of multivariate GWAS for interrogating the genetic architecture of correlated traits.
Plain language summary: Increasing provitamin A in cassava roots has reduced their dry matter content, making vitamin-enriched cassava varieties less desirable. This study used multi-trait models to identify shared genetic factors, most of which had opposing effects on the two traits. The negative relationship was distributed across the genome, suggesting an inherent physiological trade-off. These findings will guide breeders in developing selection strategies for vitamin-enriched cassava and other starchy crops. More broadly, this study demonstrates the use of multi-trait associations to help distinguish whether traits are associated due to separate, nearby genes (genetic linkage) or if the same genes affect multiple traits (pleiotropy).
Ramasubramanian, V.; Beavis, W. D.
Herein we report the impacts of applying five selection methods across 40 cycles of recurrent selection and identify interactions among factors that affect genetic responses in sets of simulated families of recombinant inbred lines derived from 21 homozygous soybean lines. Our use of a recurrence equation to model response from recurrent selection allowed us to estimate the half-lives and asymptotic limits to recurrent selection for purposes of assessing the rates of response and future genetic potential of populations under selection. The simulated factors include selection methods, training sets, and selection intensity that are under the control of the plant breeder as well as genetic architecture and heritability. A factorial design to examine and analyze the main and interaction effects of these factors showed that both the rates of genetic improvement in the early cycles and limits to genetic improvement in the later cycles are significantly affected by interactions among all factors. Some consistent trends are that genomic selection methods provide greater initial rates of genetic improvement (per cycle) than phenotypic selection, but phenotypic selection provides the greatest long term responses in these closed genotypic systems. Model updating with training sets consisting of data from prior cycles of selection significantly improved prediction accuracy and genetic response with three parametric genomic prediction models. Ridge Regression, if updated with training sets consisting of data from prior cycles, achieved better rates of response than BayesB and Bayes LASSO models. A Support Vector Machine method, with a radial basis kernel, had the worst estimated prediction accuracies and the least long term genetic response. Application of genomic selection in a closed breeding population of a self-pollinated crop such as soybean will need to consider the impact of these factors on trade-offs between short term gains and conserving useful genetic diversity in the context of the goals for the breeding program.
Halpin-McCormick, A.; Campbell, Q.; Negrao, S.; Morrell, P. L.; Hubner, S.; Neyhart, J.; Kantar, M. B.
The genetic basis of adaptation is a fundamental question in evolutionary genetics. Environmental association analysis (EAA) and various allele frequency comparisons in genomic environmental association (GEA) have become standard approaches for investigating the genetic basis of adaptation to natural environments. While these analyses provide insight into local adaptation, they have not been widely adopted in breeding or conservation programs. This may be attributable to the difficulty in identifying the best individuals for transplantation/relocation in conservation efforts or identification of the best parents in breeding programs. To explore the use of EAA and GEA for future breeding programs, we used a cereal crop, barley (Hordeum vulgare L.), as our case-study species due to its wide adaptability to different environments and agro-ecologies, ranging from marginal and low input fields to highly productive farms. Here, we use publicly available data to conduct environmental genomic selection (EGS) on 753 landrace barley accessions using a mini-core of 31 landrace accessions and a de-novo core of 100 as the training populations. Environmental genomic selection is to environmental association analysis (EAA) what genomic selection is to genome-wide association studies (GWAS). Since local adaptation to the environment is polygenic, a whole-genome approach is likely to be more accurate for selecting for environmental adaptation. Here we show distinct genetic background and population differences and how an integrative approach coupling environmental genomic selection and species distribution modelling can help identify key parents for breeding for adaptation to specific environmental variables and geographies to minimize linkage drag.
Cordoba Novoa, H. A.; Hoyos-Villegas, V.
The study of mutations is fundamental to understanding evolution, domestication, and genetics. Characterizing mutations has the potential to accelerate breeding programs through selection and purging of deleterious mutations (DelMut). Here, we investigated how predicting DelMut in breeding populations can improve genomic prediction (GP) and inform strategies to increase the rate of genetic gain. DelMut were annotated in three independent common bean populations using a previously developed random forest (RF) model incorporating phylogenetic and protein information. Deleterious scores from the RF model were mostly around 0.25, with the top 1% (highly DelMut) of variants scoring between 0.78 - 0.82 among populations. All populations showed variation in the number of highly DelMut per line (max. 13 - 197) and in genetic load. We assessed the impact of incorporating a priori information for variant prioritization and weighting based on predicted deleteriousness in GP models for yield and flowering time. Stochastic simulations were conducted to evaluate how different mating schemes based on variable numbers of DelMut per parent affect genetic gain. Variants with higher predicted scores had significantly different effect distributions compared to random or lower-scored markers. Yield predictions were 4.47-12.3% more accurate when markers were weighted by effect and deleterious score; no consistent improvement was observed for flowering time. Simulated breeding cycles showed that selecting parents with fewer highly DelMut consistently increases the rate of genetic gain. These results highlight the potential of DelMut information for variant prioritization and the optimization of common bean breeding programs. The approaches we developed can be assessed in other species to improve the efficacy of crop improvement.
Key messages:
- Predicted deleterious mutations have different distributions of effects based on population composition.
- Variant prioritization and differential weighting of markers based on effects and deleterious scores can improve the prediction of yield.
- Favoring mating schemes between parents with fewer highly deleterious mutations can increase the rate of genetic gain.
Della Coletta, R.; Fernandes, S.; Monnahan, P.; Mikel, M.; Bohn, M. O.; Lipka, A. E.; Hirsch, C.
Breeders commonly use genetic markers to predict the performance of untested individuals as a way to improve the efficiency of breeding programs. These genomic prediction models have almost exclusively used single nucleotide polymorphisms (SNPs) as their source of genetic information, even though other types of markers exist, such as structural variants (SVs). Given that SVs are associated with environmental adaptation and not all of them are in linkage disequilibrium with SNPs, SVs have the potential to bring additional information to multi-environment prediction models that is not captured by SNPs alone. Here, we evaluated different marker types (SNPs and/or SVs) on prediction accuracy across a range of genetic architectures for simulated traits across multiple environments. Our results show that SVs can improve prediction accuracy by up to 19%, but the improvement is highly dependent on the genetic architecture of the trait. Differences in prediction accuracy across marker types were more pronounced for traits with high heritability, a high number of QTLs, and SVs as causative variants. In these scenarios, using SV markers resulted in better prediction accuracies than SNP markers, especially when predicting untested genotypes across environments, likely due to more predictors being in linkage disequilibrium with causative variants. The simulations revealed little impact of different effect sizes between SNPs and SVs as causative variants on prediction accuracy. This study demonstrates the importance of knowing the genetic architecture of a trait in deciding what markers and marker types to use in large-scale genomic prediction modeling in a breeding program.
Key message: We demonstrate potential for improved multi-environment genomic prediction accuracy using structural variant markers. However, the degree of observed improvement is highly dependent on the genetic architecture of the trait.
Grubben, J.; Bijsterbosch, G.; Aktürk, B.; Visser, R. G. F.; Schouten, H.
Despite the success of CRISPR/Cas9 in inducing DNA double-strand breaks (DSBs) for genome editing, achieving targeted recombination in somatic cells remains challenging, particularly at recombination cold spots like the Tomato Mosaic Virus (ToMV) resistance locus in Solanum lycopersicum. We investigated the potential of CRISPR/Cas9-induced targeted recombination in somatic cells to overcome linkage drag surrounding the ToMV locus. We employed two strategies: first, inducing DSBs in both alleles of F1 tomato seedlings to promote non-homologous end joining (NHEJ) and homology-directed repair (HDR); second, targeting a single allele in a heterozygous background to induce HDR in seedlings. CRISPR/Cas9 activity was confirmed in F1 seedlings by detecting NHEJ-mediated mutations at the target sites in ToMV. We developed a bioinformatics pipeline to identify targeted recombinants by analyzing single nucleotide polymorphisms (SNPs) between parental haplotypes, allowing precise tracking of SNP variations. A two-dimensional pooling strategy was employed to distinguish genuine recombination events from PCR artifacts. Despite these advances and the active CRISPR/Cas9 system in F1 progeny, no increase in recombination frequency was observed compared to wild-type plants. We extended our research to protoplasts to assess whether CRISPR/Cas9 could induce targeted recombination under different cellular conditions at the same locus. Consistent with our findings in F1 plants, we observed no increase in recombinant patterns compared to wild-type controls in protoplasts. Our findings suggest that, in somatic cells, CRISPR/Cas9-induced DSBs are insufficient to break the genetic linkage at the ToMV locus, a recombination cold spot on chromosome 9.
Article summary: This research targets plant biologists and geneticists interested in enhancing plant breeding techniques. The study used CRISPR/Cas9 technology to induce DNA breaks in tomato plants. It specifically targeted the Tomato Mosaic Virus (ToMV) resistance gene, which resists natural recombination. The aim was to induce genetic recombination via CRISPR/Cas9. The highly active CRISPR/Cas9 system did not increase the expected genetic changes, indicating challenges in achieving targeted recombination. These findings highlight the challenges in breaking genetic linkages in specific genome regions using current CRISPR methods and are relevant for developing techniques for targeted recombination in plant breeding.
Labroo, M. R.; Endelman, J. B.; Gemenet, D. C.; Werner, C. R.; Gaynor, R. C.; Covarrubias-Pazaran, G. E.
To produce genetic gain, hybrid crop breeding can change the additive as well as dominance genetic value of populations, which can lead to utilization of heterosis. A common hybrid breeding strategy is reciprocal recurrent selection (RRS), in which parents of hybrids are typically recycled within pools based on general combining ability (GCA). However, the relative performance of RRS and other possible breeding strategies has not been thoroughly compared. RRS can have relatively increased costs and longer cycle lengths, which reduce genetic gain, but these are sometimes outweighed by its ability to harness heterosis due to dominance and increase genetic gain. Here, we used stochastic simulation to compare gain per unit cost of various clonal breeding strategies with different amounts of population inbreeding depression and heterosis due to dominance, relative cycle lengths, time horizons, estimation methods, selection intensities, and ploidy levels. In diploids with phenotypic selection at high intensity, whether RRS was the optimal breeding strategy depended on the initial population heterosis. However, in diploids with rapid cycling genomic selection at high intensity, RRS was the optimal breeding strategy after 50 years over almost all amounts of initial population heterosis under the study assumptions. RRS required more population heterosis to outperform other strategies as its relative cycle length increased and as selection intensity decreased. Use of diploid fully inbred parents vs. outbred parents with RRS typically did not affect genetic gain. In autopolyploids, RRS typically was not beneficial regardless of the amount of population inbreeding depression.
Key message: Reciprocal recurrent selection sometimes increases genetic gain per unit cost in clonal diploids with heterosis due to dominance, but it typically does not benefit autopolyploids.
El-Walid, M. Z.; Gault, C. M.; Costich, D. E.; Lepak, N. K.; Stitzer, M. C.; Giri, A.; Rees, E. R.; Budka, J. S.; Romay, M. C.; Buckler, E. S.; Hsu, S.-K.
This study investigates the genetic basis of freezing tolerance in Tripsacum dactyloides and related subspecies as a potential source of valuable traits for improving maize agriculture. Recognizing the significant economic losses in corn yields due to frost damage, we hypothesized that northern populations of T. dactyloides are enriched for freezing tolerance alleles. Forty diverse Tripsacum accessions were collected from natural populations and long-established field collections and used to generate F1 hybrids and open-pollinated F2 families. F2 seedlings were germinated, then screened within a growth chamber for freezing tolerance by exposure to freezing temperatures. Seedlings were then phenotyped by tissue survival, and extremes were pooled to create tolerant and susceptible bulks. DNA sequencing was performed on founders, F1s, and tolerant/susceptible F2 bulks. To overcome challenges in traditional SNP calling in bulked samples, we developed a regression-based approach to estimate gamete frequencies and impute allele frequencies in pooled populations. The results showed genetic diversity among Tripsacum accessions, with divergence between northern and southern populations. We tracked segregation of alleles across genomic loci and performed a joint bulk segregant analysis, identifying 7 QTLs significantly associated with freezing tolerance. These findings highlight potential loci for freezing tolerance that could inform genetic engineering of maize.
Central hypothesis: Northern populations of Tripsacum dactyloides, a wild relative of maize, are enriched for freezing tolerance alleles which can be identified by mapping.